Improving Table Compression with Combinatorial Optimization
Abstract
We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA’00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35–55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard.
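The partitioning idea in the abstract can be illustrated with a minimal sketch. It assumes a fixed-width table given as equal-length byte rows, uses Python's `zlib` (the same DEFLATE method that gzip uses) as a stand-in for the compressor, and finds a best split into contiguous column intervals by dynamic programming. The names `interval_cost` and `optimal_partition` are hypothetical, and this is not the authors' implementation; the on-line training algorithms and the column reordering described in the abstract are not shown.

```python
import zlib


def interval_cost(rows, lo, hi):
    """Compressed size in bytes of columns lo..hi-1, taken row by row."""
    data = b"".join(row[lo:hi] for row in rows)
    return len(zlib.compress(data, 9))


def optimal_partition(rows):
    """Return (total_size, intervals) minimizing the summed compressed size.

    best[j] is the smallest total size for the first j columns;
    cut[j] is the start of the last interval in that best solution.
    """
    n = len(rows[0])
    best = [0] * (n + 1)
    cut = [0] * (n + 1)
    for j in range(1, n + 1):
        best[j], cut[j] = min(
            (best[i] + interval_cost(rows, i, j), i) for i in range(j)
        )
    # Walk the cut points backwards to recover the intervals.
    intervals = []
    j = n
    while j > 0:
        intervals.append((cut[j], j))
        j = cut[j]
    intervals.reverse()
    return best[n], intervals


# Toy table (made-up data): columns 0 and 1 are identical, column 2 is
# unrelated to them, and column 3 is a short repeating pattern.
rows = [bytes([i % 7, i % 7, (i * 31) % 251, i % 3]) for i in range(200)]
total, intervals = optimal_partition(rows)
print(total, intervals)
```

Because every contiguous split is a candidate, the result is never larger than compressing the whole table as one block or compressing each column on its own. The sketch calls the compressor on all O(n^2) column intervals, which is only practical for a small training sample.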
Similar resources
A Method for Increasing Compression Ratio of Palette Color Images
We address the problem of pseudocolor image compression. Image values represent indices into a look up table (palette). Due to quantization, the neighbouring pixel values (indices) change too much. This deteriorates performance of both lossless and lossy image compression methods. We suggest a preprocessing phase that (a) analyses statistics of the adjacency relations of index values, (b) perfo...
Model-Based Semantic Compression for Network-Data Tables
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality...
A Model-Based Semantic Compression System for Massive Data Tables
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such techniques are clearly motivated by the ever-increasing data collection rates of modern enterprises and the need for effective, guaranteed-quality...
Invisible Modification of the Palette Color Image Enhancing Lossless Compression
Our contribution relates to lossless compression of pseudo color images (images with a palette). The proposed method is a preprocessing step preceding actual compression. Indices in the palette are semi-optimally permuted during preprocessing. For actual image compression, our own nonlinear predictor based method is used but the proposed invisible palette modification is relevant to most of ot...
Optimization of profit and customer satisfaction in combinatorial production and purchase model by genetic algorithm
Optimization of inventory costs is the most important goal in industries. But in many models, the constraints are considered simple and relaxed. Some actual constraints are to consider the combinatorial production and purchase models in multi-products environment. The purpose of this article is to improve the efficiency of inventory management and find the economic order quantity and economic p...
Publication date: 2002